/********************************************************************************
Discrimination in Multi-Phase Systems: Evidence from Child Protection

Created on: 12/28/2022
Last Modified on: 2/13/2024

Description: This program generates the investigator analysis sample at the 
child-by-investigation level.

Note that we have removed the file directory names from this program for 
confidentiality reasons. 
********************************************************************************/

**************************
**(0) SETUP
**************************
clear
set more off
macro drop _all
capture log close
set seed 02042023

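*Set directories (paths removed for confidentiality; each path must end with a 
*directory separator, since file names are appended to it directly)
global cleandata 
global tmpdata 
global output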


**************************
**(1) GENERATE SAMPLE AND REMAINING VARIABLES NEEDED FOR SUBSEQUENT ANALYSES
**************************
use "${cleandata}analysis_sample_investigators_qje.dta", clear 

*Generate additional variables for the summary statistics table: 
*alleged perpetrator type, neglect, and number of children in the allegation 
gen parent = mom==1 | dad==1 | parent_
gen rel2 = notrel==0 & parent==0 
gen neglect = phyneg==1 | medneg==1 | failprot==1 | impsup==1 
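/* Illustration of how Stata evaluates the indicators above (hypothetical values,
   not from the data). Logical expressions in Stata return 0 or 1; a comparison
   such as mom==1 is false (0) when mom is missing; and inside | any nonzero
   value, including missing, counts as true. For parent as written:
       mom   dad   parent_   ->   parent
        1     0      0              1
        .     .      0              0
        0     0      .              1
   So parent and neglect are never missing, which means miss_parent and
   miss_neglect created below are always 0.
   A quick interactive check (commented out so it does not run):
   * display (. == 1) | 0
   * display 0 | .
   The first should display 0 and the second 1.
*/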

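*Merge in the number of children in the allegation; keep(1 3) keeps matched cases 
*and unmatched analysis-sample cases (num_children is missing for the latter)
cap drop _merge 
merge 1:1 vicid inv_caseid using "${tmpdata}num_children_2008_2019", keepus(num_children) keep(1 3)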

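*Generate missing indicators, then set missing values to 0 
*(missing() also catches extended missing values .a-.z, which ==. does not)
foreach x in white female age_inv phyab neglect maltreat parent rel notrel num_children {
	gen miss_`x' = missing(`x')
	replace `x' = 0 if missing(`x')
}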
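/* Worked example of the recode above for a single variable (hypothetical
   values, not from the data):
       age_inv (before)   miss_age_inv   age_inv (after)
              .                 1                0
             12                 0               12
   Pairing the zero-filled variable with its miss_* dummy lets a regression that
   includes both keep observations with a missing covariate (the usual
   missing-indicator approach); whether later programs use the dummies this way
   is an assumption, not something this program shows.
*/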

*Generate a leave-one-out measure of investigator stringency, following the approach described in Arnold, Dobbie, and Yang (QJE, 2018)
cap drop n
cap drop n_leaveout 
cap drop fc_sum 
cap drop fc_leaveout 
cap drop z 
cap drop fchat 

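*Count cases in each investigator (worker_id) by race (white) cell; the 
*leave-one-out denominator excludes the current case
bysort worker_id white: gen n = _N
gen n_leaveout = n - 1
order n_leaveout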

*Residualize foster care placement on rotation-group fixed effects
reghdfe fc, absorb(rotationgroup) resid
predict fchat, resid
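/* Illustration of the residualization step above (a sketch, not the authors'
   code). With rotationgroup fixed effects and no other regressors, the
   residual is the deviation of fc from its rotation-group mean:
       fchat_i = fc_i - mean(fc over cases in i's rotation group)
   For example, a rotation group with fc = 1, 0, 0, 0 has mean 0.25, so
   fchat = 0.75, -0.25, -0.25, -0.25.
   An equivalent computation without reghdfe (commented out; fc_rg_mean and
   fchat_check are hypothetical names):
   * bysort rotationgroup: egen fc_rg_mean = mean(fc)
   * gen fchat_check = fc - fc_rg_mean
   Assumption: reghdfe drops singleton rotation groups by default, so fchat may
   be missing for those cases, whereas the plain-mean version gives them 0.
*/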
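*Sum residuals within each worker_id x white cell, subtract the current case's 
*own residual, and average over the remaining cases in the cell
bysort worker_id white: egen fc_sum = total(fchat)
order fc_sum

gen fc_leaveout = fc_sum - fchat
gen z = fc_leaveout/n_leaveout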
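/* Worked example of the leave-one-out measure above (hypothetical numbers, not
   from the data). For case i in investigator-by-race cell c,
       z_i = (sum over j in c of fchat_j - fchat_i) / (n_c - 1)
   Suppose one worker_id x white cell has four cases with residuals
       fchat = 0.6, -0.3, 0.3, 0.0   so fc_sum = 0.6, n = 4, n_leaveout = 3
   Then
       case 1: z = (0.6 - 0.6) / 3 = 0.0
       case 2: z = (0.6 + 0.3) / 3 = 0.3
       case 3: z = (0.6 - 0.3) / 3 = 0.1
       case 4: z = (0.6 - 0.0) / 3 = 0.2
   Each case's own outcome is excluded from its stringency measure. A cell with
   a single case has n_leaveout = 0, and Stata returns missing on division by
   zero, so z is missing there. A possible sanity check (commented out):
   * assert missing(z) if n_leaveout == 0
*/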

*Label variables 
label var white "White"
label var female "Female"
label var age_inv "Age during investigation"
label var phyab "Physical abuse allegation"
label var neglect "Neglect allegation"
label var maltreat "Maltreatment allegation"
label var num_children "Number of children in allegation"
label var parent "Alleged perpetrator includes the parent/step-parent"
label var rel "Alleged perpetrator includes a non-parent relative"
label var rel2 "Alleged perpetrator includes a non-parent relative"
label var notrel "Alleged perpetrator includes someone unrelated"
label var inv6m "Re-investigated within 6 months"
label var fc "Foster Care Placement Rate"


save "${cleandata}inv_analysis_sample_child_inv.dta", replace 